Schema Extraction from XML Data: A Grammatical Inference Approach

Author

  • Boris Chidlovskii
Abstract

New XML schema languages have recently been proposed to replace Document Type Definitions (DTDs) as the schema mechanism for XML data. These languages consistently combine grammar-based constructions with constraint- and pattern-based ones and have greater expressive power than DTDs. As schemas remain optional for XML data, we address the problem of schema extraction from XML data. We model XML schemas as extended context-free grammars and propose a schema extraction algorithm based on methods of grammatical inference. The extraction algorithm also copes with the schema determinism requirement imposed by XML DTDs and XML Schema languages. We report the test results of schema extraction on a collection of real XML documents.
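
As a rough illustration of the input such an approach works from, the sketch below gathers, for each element name, the sequences of child element names seen in a document, then stores them in a prefix tree acceptor, a common starting structure in grammatical inference. This is a minimal sketch under stated assumptions, not the algorithm of the paper: the function names `collect_child_sequences` and `build_prefix_tree`, the `$end` marker, and the sample document are all hypothetical, and the generalization step (for example, state merging) and the determinism check are omitted.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def collect_child_sequences(xml_text):
    """Map each element name to the set of child-name sequences observed.

    Hypothetical helper: these sequences serve as the positive samples
    from which a content model could later be generalized.
    """
    samples = defaultdict(set)
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        samples[elem.tag].add(tuple(child.tag for child in elem))
    return dict(samples)


def build_prefix_tree(sequences):
    """Build a prefix tree acceptor as nested dicts.

    The key "$end" (an assumed marker) flags an accepting position,
    i.e. a point where an observed sequence stops.
    """
    tree = {}
    for seq in sequences:
        node = tree
        for symbol in seq:
            node = node.setdefault(symbol, {})
        node["$end"] = {}
    return tree


# Hypothetical sample document, used only for illustration.
DOC = "<book><title/><author/><author/><chapter/></book>"

samples = collect_child_sequences(DOC)
pta = build_prefix_tree(samples["book"])
```

A grammatical inference method would then generalize this tree, for instance turning the repeated `author` into a repetition in a regular expression, which the sketch does not attempt.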

Similar resources

Towards Schema-Guided XML Query Induction

XML query induction is a key task in Web information extraction. Recent approaches based on grammatical inference represent node selection queries in XML trees by deterministic tree automata. In this paper, we show how to guide RPNI-based learning algorithms by XML schemas which we can infer in a preprocessing step. We hope that schema guidance will help to improve heuristics that are essential...

On Structural Inference for XML Data

Semistructured data presents many challenges, mainly due to its lack of a strict schema. These challenges are further magnified when large amounts of data are gathered from heterogeneous sources. We address this by investigation and development of methods to automatically infer structural information from example data. Using XML as a reference format, we approach the schema generation problem b...

Effective Structural Inference for Large XML Documents

This paper investigates methods to automatically infer structural information from large XML documents. Using XML as a reference format, we approach the schema generation problem by application of inductive inference theory. In doing so, we review and extend results relating to the search spaces of grammatical inferences for large data sets. We evaluate the result of an inference process using t...

An Algorithm for Automatic Inference of Referential Integrities During Translation from Relational Database to XML Schema

XML is rapidly becoming one of the most widely adopted technologies for information exchange and representation on the World Wide Web. However, a large part of the data is still stored in relational databases, and this relational data needs to be converted into XML documents. There are existing approaches such as NeT and CoT to convert relational models to XML models but those approaches only con...

Publication date: 2001